Group 11: Analysis of Prostate Cancer Data

Introduction

Prostate cancer, a disease that significantly affects the lives of men, calls for careful study to refine diagnostic approaches and treatment regimes. Understanding the complexity of this condition underlines the importance of applying advanced data analysis techniques to patient data. Extracting meaningful insights from large datasets not only deepens our understanding of prostate cancer but also gives healthcare professionals valuable information for tailoring care to each patient. Our goal is to explore the relationships, patterns, and predictive models linking prostate cancer characteristics to patient outcomes, ultimately improving our ability to provide personalized and effective care for men living with this disease.

Materials and Methods

When preparing the data for analysis, we revised the columns for clarity and consistency and converted text values into numerical values. Missing values were either replaced or removed, leaving a complete dataset for analysis. We also split columns that combined several values so that each cell holds a single value. Together, these steps prepared the data for more accurate and efficient analysis.
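The preparation steps above can be sketched with pandas. This is a minimal illustration, not the group's actual pipeline: the raw table, its column names (`Stage `, `RX`, `BM`, `HG`, `size_weight`), the semicolon-joined `size_weight` column and the use of median imputation for haemoglobin are all hypothetical assumptions made for the example.

```python
import pandas as pd

# Hypothetical raw records: column names, codings and values are illustrative only.
raw = pd.DataFrame({
    "Stage ": ["Stage 3", "Stage 4", "stage 3", None],
    "RX": ["placebo", "0.2 mg estrogen", "5.0 mg estrogen", "1.0 mg estrogen"],
    "BM": ["no", "yes", "No", "yes"],
    "HG": ["13.8", "", "14.6", "12.0"],
    "size_weight": ["2;91", "42;91", "3;92", "10;100"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # 1. Consistent column names: trimmed and lower-case.
    out.columns = [c.strip().lower() for c in out.columns]

    # 2. Text -> numbers.
    out["stage"] = pd.to_numeric(
        out["stage"].str.extract(r"(\d+)", expand=False), errors="coerce")
    out["bm"] = out["bm"].str.lower().map({"no": 0, "yes": 1})
    # Dose in mg parsed from the treatment text; "placebo" has no number -> 0 mg.
    out["dose_mg"] = pd.to_numeric(
        out["rx"].str.extract(r"([\d.]+)\s*mg", expand=False),
        errors="coerce").fillna(0.0)
    out["hg"] = pd.to_numeric(out["hg"], errors="coerce")  # "" becomes NaN

    # 3. Split a two-value column so each cell holds a single value.
    out[["sz", "wt"]] = out["size_weight"].str.split(";", expand=True).astype(float)

    # 4. Missing values: drop rows without a stage, impute haemoglobin with the median.
    out = out.dropna(subset=["stage"]).copy()
    out["stage"] = out["stage"].astype(int)
    out["hg"] = out["hg"].fillna(out["hg"].median())

    return out.drop(columns=["rx", "size_weight"]).reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Rows with a missing stage are dropped rather than imputed in this sketch because stage defines the subgroups analysed later; whether to drop or impute each column is a judgement call for the real data.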

Data exploration

Results: Logistic regression modelling

The objective is to predict dosage from bone metastasis (bm), weight index, primary lesion size and age-adjusted haemoglobin, separately for patients with stage 3 and stage 4 prostate cancer. The coefficients of these predictors are shown below, together with their significance as indicated by their p-values.
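Because logistic regression needs a binary outcome, the sketch below assumes dosage has been recoded into two classes (for example, high versus low dose); that recoding, the simulated patients and the predictor order in the hypothetical `PREDICTORS` list are assumptions, not the study's data. It fits one logistic model per stage by Newton-Raphson with NumPy and reports Wald z-test p-values, the same kind of p-value that standard GLM software prints in its coefficient table.

```python
import math

import numpy as np

PREDICTORS = ["bm", "wt", "sz", "hg"]  # hypothetical column order


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit logistic regression by Newton-Raphson; return coefficients, SEs and Wald p-values."""
    X = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))             # predicted probabilities
        info = X.T @ (X * (p * (1.0 - p))[:, None])     # Fisher information X^T W X
        step = np.linalg.solve(info, X.T @ (y - p))     # Newton step from the score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    se = np.sqrt(np.diag(cov))
    z = beta / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    pvalues = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return beta, se, pvalues


def simulate(rng, n):
    """Synthetic, hypothetical patients; NOT the study data."""
    bm = rng.integers(0, 2, n)
    wt = rng.normal(100.0, 15.0, n)
    sz = rng.gamma(2.0, 7.0, n)
    hg = rng.normal(13.5, 2.0, n)
    # Assumed true model: only lesion size shifts the odds of the high-dose class.
    logit = -1.0 + 0.08 * (sz - 14.0)
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
    return np.column_stack([bm, wt, sz, hg]).astype(float), y


rng = np.random.default_rng(0)
results = {}
for stage in (3, 4):  # one model per stage subgroup
    X, y = simulate(rng, 300)
    results[stage] = fit_logistic(X, y)
    beta, se, pvalues = results[stage]
    print(f"Stage {stage}")
    for name, b, pv in zip(["intercept"] + PREDICTORS, beta, pvalues):
        print(f"  {name:<9} coef = {b:8.4f}  p = {pv:.3g}")
```

Fitting the stages separately lets every coefficient differ between stage 3 and stage 4; a single model with stage interaction terms would be an alternative design.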

Results: Principal Component Analysis (PCA)

Principal Component Analysis (PCA) was carried out in three steps. First, the data were examined in principal component (PC) coordinates; next, the rotation matrix was analysed; finally, we examined the proportion of variance explained by each principal component.
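The three PCA steps can be sketched with NumPy's singular value decomposition. This is an assumed implementation on simulated data with six hypothetical variables, and it standardises each variable first, which is an assumption about how the original analysis was run. The names `scores` and `rotation` follow the terminology above; in R's `prcomp`, the corresponding outputs are `x` and `rotation`.

```python
import numpy as np


def pca(X):
    """PCA of standardised data via SVD; returns PC scores, rotation matrix and variance ratios."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # centre and scale each variable
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    rotation = vt.T                    # columns are PCs, rows are original variables
    scores = Z @ rotation              # step 1: observations in PC coordinates
    pc_var = s**2 / (Z.shape[0] - 1)   # variance captured by each PC
    explained = pc_var / pc_var.sum()  # step 3: proportion of variance explained
    return scores, rotation, explained


# Hypothetical data: 200 simulated patients x 6 variables, two of them correlated.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
X[:, 1] += 0.8 * X[:, 0]

scores, rotation, explained = pca(X)
print("rotation (loadings):\n", np.round(rotation, 2))  # step 2: inspect loadings
print("variance explained:", np.round(explained, 3))
print("cumulative:", np.round(np.cumsum(explained), 3))
```

Plotting the first two columns of `scores` against each other is one way to look for groupings, and large absolute entries in a column of `rotation` show which variables drive that component.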

Discussion

PCA is used to look for groupings and patterns in the data based on all relevant variables. It did not work particularly well for this dataset: the first principal component (PC1) captures just over 25% of the total variance, which is low. Some datasets inherently need several principal components to represent different aspects of their variability, and that is the case here.
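The need for several components can be made concrete by counting how many leading PCs are required to reach a chosen share of the total variance. The ratios below are hypothetical (only the first, just over 25%, echoes the PC1 figure quoted above), and the 75% and 90% thresholds are arbitrary choices for illustration.

```python
import numpy as np

# Hypothetical variance ratios for eight PCs; only PC1 (just over 25%) echoes the text.
explained = np.array([0.26, 0.18, 0.14, 0.12, 0.10, 0.08, 0.07, 0.05])


def n_components_for(explained, threshold):
    """Smallest number of leading PCs whose cumulative variance reaches `threshold`."""
    cumulative = np.cumsum(explained)
    idx = int(np.searchsorted(cumulative, threshold))  # first index with cumulative >= threshold
    return min(idx + 1, len(cumulative))


for t in (0.75, 0.90):
    print(f"{t:.0%} of variance needs {n_components_for(explained, t)} PCs")
```

With these illustrative ratios, 5 components are needed to reach 75% of the variance and 7 to reach 90%, which is the kind of spread that makes a two-dimensional PCA plot hard to interpret.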